Efficient High-Dimensional Kernel k-Means++ with Random Projection
Authors
Abstract
Using random projection, a method is proposed to speed up both kernel k-means and centroid initialization with k-means++. Motivated by upper error bounds, we approximate the kernel matrix and distances in a lower-dimensional space $\mathbb{R}^d$ before clustering. With random projections, previous work on bounds for dot products and an improved bound for kernel methods are considered for kernel k-means. The complexities of kernel k-means with Lloyd's algorithm and of centroid initialization with k-means++ are known to be $O(nkD)$ and $\Theta(nkD)$, respectively, where $n$ is the number of data points, $D$ the dimensionality of the input feature vectors, and $k$ the number of clusters. The proposed method reduces the computational complexity of the kernel computation from $O(n^2 D)$ to $O(n^2 d)$, and that of the subsequent computation from $O(nkD)$ to $O(nkd)$. Our experiments demonstrate that, with the dimensionality reduced to $d = 200$, the speed-up is 2 to 26 times, with very little performance degradation (less than one percent) in general.
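The pipeline the abstract describes (project the $D$-dimensional inputs to $d$ dimensions, build the kernel matrix on the projected vectors, seed with k-means++, then run Lloyd iterations) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not fix the projection distribution or the kernel, so the sketch assumes a Gaussian random matrix with $N(0, 1/d)$ entries and an RBF kernel with a hand-picked `gamma`, and every function name is invented for illustration. It uses only the Python standard library to stay self-contained, and because it works from a precomputed $n \times n$ kernel matrix, its Lloyd step costs $O(n^2)$ per iteration; it shows where $d$ replaces $D$ (projection and kernel computation), not the exact complexity bounds stated above.

```python
import math
import random

# Hypothetical sketch: the Gaussian projection and RBF kernel are assumptions,
# not necessarily the choices made in the paper.


def random_projection(X, d, seed=0):
    """Map each D-dimensional row of X to d dimensions with a Gaussian matrix R.

    Entries of R are drawn from N(0, 1/d), so E[||R^T x||^2] = ||x||^2 and
    pairwise distances are approximately preserved (Johnson-Lindenstrauss).
    """
    rng = random.Random(seed)
    D = len(X[0])
    R = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(D)]
    return [[sum(x[i] * R[i][j] for i in range(D)) for j in range(d)] for x in X]


def rbf_kernel_matrix(Z, gamma):
    """K[i][j] = exp(-gamma * ||z_i - z_j||^2), costing O(n^2 d) on projected rows."""
    n = len(Z)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq = sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * sq)
    return K


def feature_dist2(K, i, j):
    """Squared distance ||phi(x_i) - phi(x_j)||^2 via the kernel trick (clamped at 0)."""
    return max(0.0, K[i][i] - 2.0 * K[i][j] + K[j][j])


def kernel_kmeanspp_seeds(K, k, seed=0):
    """k-means++ seeding in feature space; assumes at least k distinct points."""
    rng = random.Random(seed)
    n = len(K)
    centers = [rng.randrange(n)]  # first seed: uniform at random
    d2 = [feature_dist2(K, i, centers[0]) for i in range(n)]
    while len(centers) < k:
        # D^2 sampling: next seed drawn with probability proportional to d2[i].
        c = rng.choices(range(n), weights=d2)[0]
        centers.append(c)
        d2 = [min(d2[i], feature_dist2(K, i, c)) for i in range(n)]
    return centers


def kernel_kmeans(K, seeds, max_iter=20):
    """Lloyd-style kernel k-means; cluster means stay implicit in feature space.

    dist(i, C) = K[i][i] - (2/|C|) * sum_{j in C} K[i][j]
                 + (1/|C|^2) * sum_{j, l in C} K[j][l]
    """
    n, k = len(K), len(seeds)
    # Initial assignment: nearest seed point in feature space.
    labels = [min(range(k), key=lambda c: feature_dist2(K, i, seeds[c])) for i in range(n)]
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        # Cluster-only term, computed once per cluster per iteration.
        inner = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                 for m in members]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # skip clusters that became empty
                dist = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + inner[c]
                if dist < best_d:
                    best, best_d = c, dist
            new_labels.append(best)
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
    return labels


# Usage: two tight, well-separated blobs in D = 100 dimensions, reduced to d = 20.
data_rng = random.Random(42)
X = ([[data_rng.gauss(0.0, 0.001) for _ in range(100)] for _ in range(10)]
     + [[data_rng.gauss(1.0, 0.001) for _ in range(100)] for _ in range(10)])
Z = random_projection(X, d=20, seed=1)
K = rbf_kernel_matrix(Z, gamma=0.05)
seeds = kernel_kmeanspp_seeds(K, k=2, seed=2)
labels = kernel_kmeans(K, seeds)
print(labels)
```

With tight, well-separated blobs like these, the two groups would be expected to receive distinct labels. A practical version would replace the nested lists with NumPy arrays, so that the projection becomes a single matrix product.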
Similar resources
Scalable Kernel Clustering: Approximate Kernel k-means
Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease of implementation. However, its run-time complexity and memory footprint increase quadratically in terms of the size of the data set, and hence, large data s...
Two-dimensional random projection
As an alternative to adaptive nonlinear schemes for dimensionality reduction, linear random projection has recently proved to be a reliable means for high-dimensional data processing. Widespread application of conventional random projection in the context of image analysis is, however, mainly impeded by excessive computational and memory requirements. In this paper, a two-dimensional random pro...
Kernel Penalized K-means: A feature selection method based on Kernel K-means
Article history: Received 11 June 2014 Received in revised form 23 October 2014 Accepted 11 June 2015 Available online 19 June 2015
Random Projection for Fast and Efficient Multivariate Correlation Analysis of High-Dimensional Data: A New Approach
In recent years, the advent of great technological advances has produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an extending range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, comp...
Almost Random Projection Machine with Margin Maximization and Kernel Features
Almost Random Projection Machine (aRPM) is based on generation and filtering of useful features by linear projections in the original feature space and in various kernel spaces. Projections may be either random or guided by some heuristics, in both cases followed by estimation of relevance of each generated feature. Final results are in the simplest case obtained using simple voting, but linear...
Journal
Journal title: Applied Sciences
Year: 2021
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app11156963